ABSTRACT

This paper gives an overview of Demaq, an XML message 
processing system operating on the foundation of transac- 
tional XML message queues. We focus on the syntax and 
semantics of its fully declarative, rule-based application lan- 
guage and demonstrate our message-based programming para- 
digm in the context of a case study. Further, we discuss op- 
timization opportunities for executing Demaq programs. 

1. INTRODUCTION 

The Web is rapidly developing from a one-way medium 
into an active distributed system, where the participating 
nodes asynchronously communicate via XML messages. Examples
of "Active Web" protocols include event notification
using RSS/Atom feeds [27], business process automation
using Web Services [21], and even new end-user interface
architectures such as AJAX [18]. Industry sectors as diverse
as securities trading [25] and multi-media news distribution
[12] have successfully introduced XML messaging as
the foundation of their processes. 

Today's systems usually implement these protocols as an 
additional tier on top of existing middleware solutions [2], 
further aggravating the problem of complexity and poor
integration that already plagues these systems [30]: Typically,
the actual business processes are specified using imperative,
high-level languages such as Java, C# or C++. An incoming
message travels through the various layers: The XML body 
of the message is transformed into the middleware's rep- 
resentation, again transformed into the programming lan- 
guage's representation, with further transformations thrown 
in as other components such as relational DBMSs are ac- 
cessed. Delivering a result requires a reverse traversal of this 
"transformation chain". This not only hurts performance, 
but also reduces developer productivity because each layer 
requires at least some separate design and coding that is 
not related to the actual application domain. Asynchronous 
operation and dependability requirements add more
dependencies and complexity. The interaction of various
configuration options and code fragments is difficult to understand,
optimize, and maintain.

The "Demaq" (DEclarative Messaging And Queuing) pro- 
ject described here investigates an alternative approach for 
the specification and implementation of such systems. Our 
work is based on a simple model: Essentially, the processes 
in the whole network as well as the processes on the indi- 
vidual nodes can be represented as a set of XML message 
queues and a set of rules for message exchange between these 
queues. The behavior of any node (or group of nodes) can 
be completely specified by enumerating its queues and asso- 
ciated rules. 

The core idea of Demaq is to use a fully declarative, ex- 
ecutable rule language for specification and implementation 
of Active Web nodes. This moves the responsibility
for implementation details from the programmer to the
processing system. Productivity increases when the typical
asynchronous "dequeue-process-react" processing model for
message-driven applications becomes part of the language
semantics. Declarativity also facilitates data independence, 
which in the case of message processing means that aspects 
such as message persistence and recovery, message retention, 
streamed or materialized representation, or transport proto- 
cols are transparent to the programmer unless their control 
is explicitly desired. These degrees of freedom can be used 
by the processing system to automatically optimize the
execution of the application. Last but not least, declarativity
simplifies reasoning about the properties of the system [13],
both on the level of individual nodes and whole systems.

We want to answer the question whether such a fully 
declarative language for XML message processing is viable 
from a systems perspective. Our Demaq server realizes an 
Active Web node by executing a declarative program. It 
leverages database technology for message processing by us- 
ing XML data stores for reliable, transactional XML mes- 
sage queues, and declarative XML query processing technol- 
ogy for efficient rule evaluation. This has never been done 
before: Existing approaches in this problem space either (1)
work on network layers below the application [24], (2) focus
on relational queue persistence [15, 19] and do not provide
a control language that natively supports XML, making the 
implementation of the above-mentioned protocols cumbersome, (3) are
partly imperative (e.g. [1, 3, 11, 17, 34]), making automatic
optimization difficult, or (4) only consider isolated
subproblems, omitting a specification of a complete system
architecture [5, 9].

Our main contribution is the fully declarative, executable 
language Demaq for XML message processing. It is based 
Figure 1: Structure of a Demaq application

on existing and emerging standards such as XML Schema
[31], XQuery, and the XQuery Update Facility [10]. The
Demaq language has only a few primitives, which can be 
divided into two sublanguages: 

A queue definition language for the specification of 

• queues for representing local state, 

• queues for the communication with remote nodes, 

• virtual queues, called slices, which group mes- 
sages according to user-defined criteria, and 

• message retention policies. 

A rule language for message flow control, which 

• is "closed", i.e. describes the system's reaction to 
messages exclusively in terms of new messages, 

• extends the XQuery Update language [10] with
queuing primitives, and 

• has simple semantics based on an execution model 
that suggests a straightforward implementation, 
but leaves room for automatic optimization. 

We complement the language description with many ex- 
amples and a discussion of our ongoing implementation, 
which shows 

• why our language can be executed in a reliable,
efficient and scalable way,

• how existing database technology can be leveraged for
message processing (exemplified by our native XML
data store Natix [16]), and

• where the opportunities for further research are.

Fig. 1 visualizes a Demaq application (top pane). The
figure also illustrates the outline of the paper: An application
consists of queue definitions specified using the Queue
Definition Language (QDL), including communication queues
to the outside world (bottom pane and Sec. 2), and Queue
Manipulation Language (QML) application rules for
message flow (middle pane and Sec. 3). Demaq applications
are executed by the Demaq server (shown on the right and
discussed in Sec. 4).



2. MESSAGES AND QUEUES 

Demaq applications are based on an infrastructure of phys- 
ical and logical structures defined by the Queue Definition 
Language (QDL). Queues provide physical message containers
that decouple message insertion from processing and
provide fast message storage and retrieval operations as well as
support for reliability and communication (Sec. 2.1). We
also introduce the notion of slices, which are used for
creating logical groups of related messages. We use slices to
simplify application development and to specify how long
messages need to be retained physically before they can be
deleted (Sec. 2.3).

In the QDL definitions in this section, the names of struc- 
tures are always qualified XML names. For brevity, we 
assume the declaration of a default namespace and omit 
namespace prefixes. 

2.1 Queues 

To provide application programs with high performance 
communication facilities, Demaq incorporates data struc- 
tures for efficient, asynchronous messaging operations. For 
this purpose, message queues have proven their usefulness 
in a vast number of messaging and integration solutions [2]. 
They allow for fast message handling operations and directly 
support the asynchronous processing model of Active Web 
applications. Consequently, queue data structures are used 
for all message handling operations in Demaq. 

Apart from the simplest solutions, distributed applica- 
tions have to keep track of their execution state, for example 
to reflect their current progress with respect to the business 
process they implement. Usually, large parts of the state
information of an application program are derived from the
messages sent to and received from remote communication 
partners. 

One possibility to represent this information is to asso- 
ciate a corresponding runtime context to each application 
instance. This approach is used by BPEL [3] and XL [17],
where instance-local variables can be used for storing state 
information. Contexts that include these variable bindings 
have to be kept for each active process instance, which leads 
to scalability issues if the number of processes is large. Some 
execution systems try to overcome this problem by serializ- 
ing data (dehydration) of "stale" instances. For example, 
the Oracle BPEL Process Manager stores application con- 
texts in a relational database system (dehydration store) 
and reacquires them when processing continues [7].

Demaq chooses another approach by requiring all data to 
be available as XML messages, each of which resides in ex- 
actly one queue. As a result, messages received from remote 
communication endpoints and internal state information are 
handled in a uniform manner, thus simplifying application 
development. Furthermore, by modeling the state of pro- 
cesses as regular data, we can leverage declarative query 
processing to obtain the relevant data instead of constantly 
loading, manipulating, and saving opaque, monolithic run- 
time contexts. 

For reasons of uniformity, system services providing re- 
mote communication facilities and timers are also modeled 
as message queues. This greatly reduces the number of prim- 
itives in the language and makes it easier to understand and 
use. 



2.1.1 Basic Queues 

Basic queues provide local message storage facilities. Ev- 
ery queue has a unique name and mode of operation, speci- 
fying whether its content has to be stored persistently or can 
be transient. The persistent queue mode guarantees that 
in case of a system crash, messages are not lost, which is 
important for business processes. Transient queues may be 
used in those parts of an application that tolerate data loss 
or can compensate for it. The following statement creates a 
persistent, local queue. 

create queue finance kind basic mode persistent 

For all queues, there may also be additional, optional pa- 
rameters, e.g. for specifying a schema all queued messages 
have to conform to. Other parameters include a priority 
level that determines the relative importance of processing 
messages from this queue compared to other queues. 

2.1.2 Gateway Queues 

Interaction with remote transport endpoints is a frequent 
operation for distributed applications such as Web Services. 
To provide application programs with a convenient way for 
remote communication, gateway queues are used to perform 
messaging operations. They represent local links to remote 
endpoints. Messages that are placed into outgoing gate- 
way queues are sent, while incoming gateway queues contain 
messages that have been received from other nodes.
Message properties (see Sec. 2.2) are used to specify recipients
and other communication parameters. 

The following example creates an outgoing gateway queue 
to communicate with an external supplier's Web Service. In 
addition to the queue name and the kind, we import the 
supplier's interface definition from a WSDL file and asso- 
ciate some Web Service extensions (WS-ReliableMessaging 
[6] and WS-Security RTj) with the queue. 

create queue supplier kind outgoingGateway mode persistent
  interface supplier.wsdl port CapacityRequestPort
  using WS-ReliableMessaging policy wsrmpol.xml
  using WS-Security policy wssecpol.xml

Note that in order to use the reliable messaging extensions 
which support reliable sending across system failures, the 
created queue must be persistent. 

From the point of view of the application rules (see below) , 
there is no difference between gateway queues and regular 
queues. This also facilitates the distribution of applications 
over several nodes by replacing local queues with pairs of 
gateway queues that connect two sites. 

By introducing gateway queues, all network-related op- 
erations can be implemented by a communication subsys- 
tem providing a queue-based interface. Sending a message 
to a remote transport endpoint can be done by inserting 
it into a corresponding gateway queue. Unfortunately, dis- 
tributed architectures such as Web Service applications in- 
volve plenty of heterogeneous, independent clients and soft- 
ware layers, each of them being a potential source of errors 
[33]. Thus, communication aspects and network failure
notifications cannot be hidden behind a queue-based messaging
interface. Instead, application programs have to become 
aware of problems encountered by the communication sub- 
system and implement corresponding error handlers. We 
postpone an overview of Demaq's error handling strategy 
until Sec. 3.6.



2.1.3 Time-Based Queues 

An important aspect of automated business processes is 
time. This is true, for example, in situations where the ab- 
sence of action should cause messages to be sent, e.g. for 
notifications or reminders. Similar to network communica- 
tion, we want to allow timer events without complicating the 
rule language. Thus, we model time-based events as mes- 
sage queues. An example of such time-based queues are echo 
queues, which enqueue any message sent to them into some 
target queue after a timeout has expired. Both the timeout 
and target queue are specified as message properties. 

The following example creates a persistent echo queue. 

create queue echoQueue kind echo mode persistent
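As an illustration only, the behavior of an echo queue can be sketched in a few lines of Python. This is not Demaq code: the class name `EchoQueue`, the property names `timeout` and `target`, and the explicit `now` argument (used instead of a real clock) are assumptions made for the sketch.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    properties: dict = field(default_factory=dict)


class EchoQueue:
    """Holds messages until their timeout expires, then forwards them."""

    def __init__(self):
        self._pending = []  # heap of (due time, sequence number, message)
        self._seq = 0       # tie-breaker so messages are never compared

    def enqueue(self, msg, now):
        # Hypothetical property names: "timeout" (seconds) and "target".
        due = now + msg.properties["timeout"]
        heapq.heappush(self._pending, (due, self._seq, msg))
        self._seq += 1

    def release_due(self, now, queues):
        """Move every message whose timeout has expired into its target queue."""
        released = 0
        while self._pending and self._pending[0][0] <= now:
            _, _, msg = heapq.heappop(self._pending)
            queues[msg.properties["target"]].append(msg)
            released += 1
        return released


queues = {"reminders": []}
echo = EchoQueue()
echo.enqueue(Message("<reminder/>", {"timeout": 30, "target": "reminders"}), now=0)
early = echo.release_due(now=10, queues=queues)  # timeout not yet expired
late = echo.release_due(now=30, queues=queues)   # timeout expired
```

Passing the current time explicitly keeps the sketch deterministic; a real server would presumably drive this from a timer service.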

2.2 Message Properties 

Apart from their XML payload, incoming messages are 
associated with metadata properties such as their size and 
a message arrival timestamp. Another kind of metadata is 
transport protocol information, such as the initial sender 
address or connection handles. Connection handles sup- 
port synchronous communication, where a response message 
must be correlated with an existing connection created by 
an incoming request. Metadata access has to be possible 
from application programs, e.g. if an acknowledgment mes- 
sage needs to include the arrival timestamp of the original 
message. 

A straightforward solution for metadata representation would
be to exploit the semi-structured nature of the XML message
format. We could encapsulate the application-specific XML
payload in a metadata-carrying XML envelope, embedding all
metadata information into the message bodies themselves. As a
result, both payload and related metadata information would be
stored uniformly and could be easily accessed by application
programs.

We have chosen a different approach in Demaq because a 
fully uniform treatment of message body and message meta- 
data does not take into account that the access patterns for 
payload and metadata differ: 

• Some metadata is maintained by the system and can- 
not be freely modified. 

• Things like connection handles should automatically 
propagate with the messages. 

• Some metadata may be computable from the message 
content. 

Embedding metadata into the message bodies would put 
the burden of metadata management on the application
developer, who would have to worry about which parts of the
messages belong to the application-specific schema, which
parts are automatically maintained by the system, which 
parts must be propagated to new messages, etc. For these 
reasons, we have included primitives for property manage- 
ment into Demaq. Properties are key/value pairs, with 
unique names and a typed, atomic value. They are deter- 
mined during message creation and remain fixed over the 
message's lifetime. There are several ways to establish prop- 
erty values: 

Explicit A property value may be explicitly specified when 
enqueuing a message (see Sec. 3.4).



System Certain properties are set by the system, such as 
the name of the rule that created a message, the time- 
stamp at which a message was created, or the sender 
of a message for incoming gateway queues. 

Inherited Properties may be inherited. A message whose 
creation was triggered by another message will have 
the same property values as the triggering message for 
all inherited properties. 
Computed A property value may be computed using an
XPath expression applied to the corresponding
message.
As an example, we define a boolean property which may
be used by messages in four different queues. It is automat- 
ically propagated from message to message if not explicitly 
set to a different value. 

create property isVIPorder as xs:boolean inherited
  queue crm, finance, legal, customer value false

We may also define properties that have different com- 
puted values based on which queue the message is inserted 
into, as in the example below. Here, "orderID" always takes
the computed value and may not be set explicitly (keyword 
fixed). Note that the value expressions need not be con- 
stants, but may be path expressions which are evaluated 
against the message body. This mechanism can be used to 
give a name to common subexpressions in rules, similar to 
views in SQL. 

create property orderID as xs:string fixed
  queue order value //orderID
  queue confirmation value /confirmedOrder/ID
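The four ways of establishing property values can be illustrated with a small Python sketch. It is an assumption-laden simplification, not the Demaq implementation: the `PROPERTIES` table, the function `resolve_properties`, and the use of ElementTree paths (written relative to the root element) are all hypothetical, system-set properties are omitted, and the restriction of a property to particular queues is only modeled for the computed case.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations, loosely mirroring the two QDL examples above.
# ElementTree paths are relative to the root element, so the absolute path
# /confirmedOrder/ID is approximated here as ./ID.
PROPERTIES = {
    "isVIPorder": {"kind": "inherited", "default": False},
    "orderID": {
        "kind": "fixed",
        "paths": {"order": ".//orderID", "confirmation": "./ID"},
    },
}


def resolve_properties(queue, body, explicit=None, trigger=None):
    """Determine the property values of a new message at creation time."""
    explicit = explicit or {}
    trigger = trigger or {}  # properties of the message that caused creation
    root = ET.fromstring(body)
    props = {}
    for name, decl in PROPERTIES.items():
        if decl["kind"] == "fixed":
            if name in explicit:
                raise ValueError(f"{name} is fixed and cannot be set explicitly")
            path = decl["paths"].get(queue)
            if path is not None:  # computed from the message body
                node = root.find(path)
                props[name] = node.text if node is not None else None
        elif name in explicit:    # explicitly set when enqueuing
            props[name] = explicit[name]
        elif name in trigger:     # inherited from the triggering message
            props[name] = trigger[name]
        else:                     # declared default value
            props[name] = decl["default"]
    return props


order = resolve_properties(
    "order", "<order><orderID>42</orderID></order>", explicit={"isVIPorder": True}
)
confirmation = resolve_properties(
    "confirmation", "<confirmedOrder><ID>42</ID></confirmedOrder>", trigger=order
)
```

Because the values are resolved once, at creation time, the sketch matches the statement that properties remain fixed over the lifetime of a message.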

2.3 Slicings 

Queues represent the most important primitive to physi- 
cally organize message storage. However, many applications 
have multiple, orthogonal criteria for categorizing messages, 
e.g. all messages belonging to a single business transaction, 
all messages received from a particular customer, all pri- 
ority orders, etc. Typically, messages from several different 
queues may be part of the same logical group. This aspect is 
depicted in Fig. 2, showing four individual business
transactions. Each transaction consists of correlated messages
stored in three different physical queues (requests, orders,
confirmations).

Demaq supports a declarative specification of such logical 
groups in the form of "virtual" queues, called slices. Slices 
are similar to the concept of parameterized views [32] for 
relational databases. There, a parameterized view defines a 
family of views, specified by a query expression with a free 
variable, such that there is one view relation for each value of 
the parameter. Similarly, a slicing in Demaq defines a family 
of virtual queues, where each virtual queue consists of all 
the messages with the same value for a particular property 
(slice key). As rules can be attached to slices, this allows for
elegant specification of a variety of typical design patterns. 

• Slices are a more general form of the "correlation sets"
in BPEL [3] and "conversations" in XL [17]. 

• Slices are also useful if the existence of more than one 
message is a prerequisite for generating a new message. 
For example, "joining" parallel control flows can be 
implemented by defining a slice, as demonstrated by 
the example in Sec. 3.5.1.
Figure 2: Slicing example (customer transactions)

• Whether or not a message is still required by some
process may depend on other messages. Slices allow
messages to be grouped in order to specify retention
policies (see Sec. 2.3.3).

In general, slices are user-defined granularities of data that 
are implied by the application domain. This not only
simplifies application specification, but also gives rise to
many optimization opportunities: Despite their logical
nature, slices can be physically stored to speed up message
access, similar to indexes and materialized views. They also 
can be used as an additional locking granularity to increase 
concurrency. We will detail the semantics of slicings in this 
section and discuss implementation techniques in Sec. 4.

2.3.1 Slicing Definition

A slicing is created by specifying a unique name and the 
slicing property, which may be any property according to 
Sec. 2.2.

The property definition lists a number of queues on which 
the property is defined. Messages from these queues are 
partitioned into slices according to their property value. All 
messages that share the same value of the slicing property 
are part of the same slice. Each slice represents a "virtual 
queue" to which rules can be attached (see Sec. 3.5.1).

The property value of a slice is called the slice key. The use 
of property values as slice keys avoids extra language primi- 
tives and reflects the fact that the ways property values are 
defined nicely match the criteria according to which appli- 
cations need to group messages: Slice keys sometimes need 
to be computed from the message (using computed property 
values). In other cases, the rule creating a message might 
want to specify the target slice by explicitly setting the slice 
key. Some messages should belong to the same slice as the 
message which caused their creation (inherited properties). 
The use of system properties such as connection handles en- 
ables the application to group all messages caused by one 
particular external message. 

In the following example, we group all order and
confirmation messages for the same orderID into a single slice.
The slicing "orders" is defined on the property orderID
introduced in Sec. 2.2.

create slicing orders on orderID
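The partitioning of messages into slices can be sketched as a simple grouping by property value. The Python below is illustrative only; the function `build_slices` and the dictionary-based message representation are assumptions, and a real system would more plausibly maintain slices incrementally rather than recompute them.

```python
from collections import defaultdict


def build_slices(queues, slicing_property):
    """Group messages from several physical queues by their slice key."""
    slices = defaultdict(list)
    for queue_name, messages in queues.items():
        for msg in messages:
            key = msg["properties"].get(slicing_property)
            if key is not None:  # only messages carrying the property take part
                slices[key].append((queue_name, msg["body"]))
    return dict(slices)


queues = {
    "order": [
        {"body": "<order/>", "properties": {"orderID": "A1"}},
        {"body": "<order/>", "properties": {"orderID": "B2"}},
    ],
    "confirmation": [
        {"body": "<confirmedOrder/>", "properties": {"orderID": "A1"}},
    ],
}
orders = build_slices(queues, "orderID")
```

Each key of `orders` corresponds to one slice, i.e. one "virtual queue" that spans the physical `order` and `confirmation` queues.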

2.3.2 Slice Resets 

Sometimes, slices have more than one "lifetime". For
example, an application for a domain name registrar may have
a slicing based on the domain name to group all messages
related to a particular domain name. If at some point a 
domain name changes owners, the application might want 
to avoid accessing messages related to the old owner when 
using the domain slice. 

To indicate that an application is no longer interested in
the content of a particular slice, slices can be "reset",
beginning a new lifetime of the slice. Only messages of the
current lifetime are visible when accessing the slice. Slightly
more formally, a slice s with a slice key k contains all those
messages that have a slicing property value of k and have
been added after the last reset operation of s.

As slices represent logical groups of messages, resetting a 
slice does not necessarily mean the contained messages are 
physically deleted. We illustrate how the slicing concept is 
related to message retention in the following section. 
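The visibility rule for slice lifetimes can be made concrete with a short Python sketch. The class `Slicing` and its logical clock are assumptions introduced for illustration; the source does not prescribe how resets are recorded.

```python
class Slicing:
    """Tracks slice membership and resets using a logical clock."""

    def __init__(self):
        self._members = {}   # slice key -> list of (sequence number, message)
        self._reset_at = {}  # slice key -> sequence number of the last reset
        self._clock = 0      # orders insertions and resets

    def add(self, key, message):
        self._clock += 1
        self._members.setdefault(key, []).append((self._clock, message))

    def reset(self, key):
        # A reset starts a new lifetime; nothing is physically deleted here.
        self._clock += 1
        self._reset_at[key] = self._clock

    def visible(self, key):
        """Messages with slice key `key` added after the last reset."""
        last_reset = self._reset_at.get(key, 0)
        return [m for seq, m in self._members.get(key, []) if seq > last_reset]


domains = Slicing()
domains.add("example.org", "registered by old owner")
domains.reset("example.org")
domains.add("example.org", "registered by new owner")
current = domains.visible("example.org")
```

Note that the old message is still stored after the reset; it is merely hidden from accesses through the slice.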

2.3.3 Message Retention 

In typical message processing systems, messages are de- 
queued, i.e. physically removed, from the queue once they 
have been processed. However, there are many reasons why 
messages need to be retained for a longer period of time. Ex- 
amples include legal reasons, auditing, and tracing system 
behavior. In the Demaq model "everything is a message", 
and the state of running processes is often encoded in "old" 
messages scattered throughout the system. Hence, access to 
already processed messages is frequently required. 

This is why Demaq is based on an append-only approach 
for message queues - messages are never modified after they 
have been created. However, we still need a mechanism 
for message removal because we cannot assume unlimited 
storage capacity. One straightforward solution is to allow 
for explicit deletion by the application program. This is 
the equivalent of manual memory management for conven- 
tional programming languages, which is a chronic source of 
errors and increases the complexity of development because 
dependable long-running processes like our Active Web ap- 
plications need to be free of memory leaks or, in our case, 
"message leaks". Further, explicit deletion leaves no degrees 
of freedom for the run-time system to optimize execution, 
which is inconsistent with our desire for declarativity. In ad- 
dition, it is often difficult for a single part of the application 
code to decide whether a message is not required any more. 
To illustrate this, consider a procurement application as a
simple example of several independent retention criteria:

1. From the packaging department's point of view, it is 
sufficient to retain an order message until the order 
has been packed and picked up by a delivery service. 
Afterwards, the message can be safely deleted. 

2. The corporate finance department requires the same 
order message to be retained to make sure that pay- 
ments received from the customer can be correlated to 
the corresponding order message. 

3. The operations research department requires all order 
messages to be kept until the end of the month when 
performing demand planning based on the most recent 
customer orders. 

When using explicit message deletion primitives in this sce- 
nario, the multiple retention requirements cannot be easily 
combined. In particular, the order in which the three condi- 
tions for safe message deletion become true varies from order 
Figure 3: Procurement scenario workflow

to order. Thus, all modules would need to know about the 
message retention policy of the other parts of the applica- 
tion, making application maintenance more difficult. 

Demaq uses a declarative approach to specify when
messages are no longer required. QML (Sec. 3) does not include
an explicit "destroy message" primitive. Instead, the exe- 
cution engine only marks whether a message has been pro- 
cessed or not. This way, physical cleanup is decoupled from 
message processing and can be done separately, for example 
in times of low system load or when the remaining storage 
capacity becomes low. 

It turns out that the slicing concept is ideally suited to 
identify those messages that are still required by the appli- 
cation logic, because 

1. Slices are specified declaratively 

2. Slices by design represent contexts in which messages 
are useful 

3. A message may belong to any number of slices 

This is why we chose to couple membership in slices to mes- 
sage retention. The Demaq execution model guarantees that 
a message is not physically removed from the message store 
as long as it is contained in at least one slice. By
resetting slices (Sec. 2.3.2), applications can indicate that
certain "old" messages are no longer interesting with respect
to that slice. Once all slices to which a message belongs
have been reset, it is eventually removed by the system.
A message which is not part of any slice may be deleted from
the message store as soon as it has been processed.
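The retention guarantee amounts to a simple predicate over a message and the reset history of its slices. The following Python sketch states it under assumed data structures (a `processed` flag, an `added_at` logical time, and slices identified by a pair of slicing name and slice key); none of these names come from Demaq itself.

```python
def removable(message, reset_times):
    """True if the message may be physically removed from the message store.

    message:     dict with 'processed' (bool), 'added_at' (logical time) and
                 'slices' (list of (slicing name, slice key) pairs).
    reset_times: maps a slice to the logical time of its last reset.
    """
    if not message["processed"]:
        return False
    # The message has left a slice once that slice was reset after the message
    # was added. With no slices at all, all() is True, so the message may go
    # as soon as it has been processed.
    return all(
        reset_times.get(s, -1) > message["added_at"] for s in message["slices"]
    )


order = {
    "processed": True,
    "added_at": 5,
    "slices": [("packaging", "A1"), ("finance", "A1"), ("planning", "month-1")],
}
resets = {("packaging", "A1"): 8}
after_packaging = removable(order, resets)  # finance and planning still need it

resets[("finance", "A1")] = 9
resets[("planning", "month-1")] = 12
after_all_resets = removable(order, resets)
```

The three slices correspond to the three departments of the procurement example: each resets its own slice independently, and no module needs to know the retention policy of the others.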

3. QUEUE MANIPULATION LANGUAGE 

The Queue Definition Language introduced in the last sec- 
tion allows the specification of the queue and slicing infra- 
structure used by an Active Web application. To implement 
the application logic, it has to be complemented by a conve- 
nient programming language. For this purpose, current ap- 
plication servers [2] successfully rely on imperative, object- 
oriented programming languages such as Java, C#, or Vi- 
sual Basic. As these languages do not incorporate XML as a 




Figure 4: Procurement scenario message flow 



first class data type, auxiliary processing steps are required 
to convert between the type systems of those languages and 
the XML format, thus harming the overall system perfor- 
mance. Recent efforts, such as Microsoft's LINQ project 
[26] or XJ [22], try to overcome this impedance mismatch.
However, even if XML can be successfully integrated as a 
first class data type, other disadvantages persist. For ex- 
ample, the optimization potential of imperative languages 
is far lower than that of declarative languages. Further, it is 
generally agreed upon that the productivity of programmers 
is higher for declarative languages than for imperative ones. 

To avoid these drawbacks, we choose a declarative lan- 
guage providing native XML support as the basis of our 
application language. Our Queue Manipulation Language 
QML is built on the concept of event-condition-action (ECA)
rules that react to a single kind of event, the arrival of an 
XML message in a queue. Each QML rule is essentially 
an XQuery Update Facility expression [10] attached to a 
queue. Within this section, we first describe the execution 
model that explains how and when rules are evaluated, be- 
fore briefly recapitulating the features of the XQuery Update 
Facility. In the remainder of this section, we develop our 
QML by implementing parts of an exemplary business
process (depicted in Fig. 3). This process is a distributed
procurement scenario taken from the chemical industry, where
multiple parties, both within the same organizational unit 
and on external systems, form a processing network. We 
provide QML code for some of the steps in the scenario, 
creating the message flow shown in Fig. 4.

3.1 Execution Model 

Following the terminology proposed by Paton and Diaz 
[29], our model applies an iterative cycle policy, relying on a
detached coupling mode that decouples the processing of a 
message from its creation. At any given point in time, there 
may be any number of messages in a node's queues that 
have not been processed yet. Each unprocessed message is 
processed exactly once, in an order determined by a message 
scheduler. The scheduler is influenced by optional priorities 
declared for the involved queues. 
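The text leaves open how exactly queue priorities influence the scheduler. One possible policy, given purely as an assumption, is to always serve the highest-priority queue that still holds unprocessed messages. In this Python sketch, removing a message from a list of unprocessed messages stands in for marking it as processed.

```python
def next_message(unprocessed, priorities):
    """Pick the next message from the highest-priority non-empty queue.

    unprocessed: queue name -> list of not yet processed messages (FIFO).
    priorities:  queue name -> priority level (higher is served first;
                 queues without a declared priority default to 0).
    Returns (queue name, message), or None if nothing is left to process.
    """
    candidates = [name for name, msgs in unprocessed.items() if msgs]
    if not candidates:
        return None
    best = max(candidates, key=lambda name: priorities.get(name, 0))
    return best, unprocessed[best].pop(0)


unprocessed = {"crm": ["a"], "finance": ["b", "c"]}
priorities = {"finance": 5}
order_of_processing = []
while (picked := next_message(unprocessed, priorities)) is not None:
    order_of_processing.append(picked)
```

Strict priority of this kind can starve low-priority queues; a production scheduler would likely need a fairer policy.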

The processing of a message consists of the evaluation of 
all rules that pertain to the queue in which the message 
resides. The evaluation of each rule results in a (possibly 
empty) list of pending actions. The resulting actions are 
state-updating primitives, with the most frequent action be- 
ing the creation of a new message and its insertion into an- 
other queue. This list of actions is then executed, and any 
created messages are made known to the scheduler. The
evaluation of all rules and the subsequent execution of all
actions caused by processing a single message are executed 
in a single transaction against the message store. Many of 
such message-processing transactions may run concurrently 
to improve performance, as long as they can be executed in 
an isolated manner. The separation of rule evaluation from 
action execution, together with the transaction mapping, 
ensures a snapshot semantics that facilitates optimization.
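The two-phase processing of a single message can be sketched as follows. This Python fragment is a simplified model under stated assumptions: rules are plain functions returning (target queue, new message) pairs, the message store is a dictionary of lists, and the transaction is simulated by applying all actions to a working copy that replaces the store only if every action succeeds.

```python
import copy


def process_message(store, queue_name, message, rules):
    """Evaluate all rules for one message, then apply their pending actions."""
    snapshot = copy.deepcopy(store)  # rules only ever see this snapshot
    pending = []
    for rule in rules.get(queue_name, []):
        pending.extend(rule(message, snapshot))  # phase 1: evaluation only

    working = copy.deepcopy(store)
    for target, new_message in pending:          # phase 2: execute actions
        working[target].append(new_message)      # KeyError aborts before commit
    store.clear()
    store.update(working)                        # simulated commit
    return pending


def forward_to_finance(message, snapshot):
    return [("finance", f"check:{message}")]


def report_finance_backlog(message, snapshot):
    # Sees the snapshot, not the message enqueued by the other rule.
    return [("legal", f"finance-backlog:{len(snapshot['finance'])}")]


store = {"crm": ["req-1"], "finance": [], "legal": []}
rules = {"crm": [forward_to_finance, report_finance_backlog]}
actions = process_message(store, "crm", "req-1", rules)
```

The second rule reports a backlog of zero although the first rule enqueues a message into `finance`, which is the snapshot behavior described above: no action takes effect until all rules have been evaluated.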

3.2 XQuery Update Facility 

The XQuery Update Facility [10] is intended to allow a
declarative specification of updates in XML data stores. The
bulk of the proposed update language is made up of XQuery
1.0. In addition, new primitives (do statements) are
introduced which can be used similarly to constructors to represent
pending intra-document updates. Expressions which return 
such pending updates are called "updating expressions" and 
can be combined using existing XQuery constructs such as 
FLWOR expressions. Updating expressions produce a pend- 
ing update list of update primitives that are applied after 
the entire statement has been evaluated, thus resulting in a 
snapshot semantics for expression evaluation. 

Using the XQuery Update Facility as the basis for QML 
has several advantages: 

1. Application developers can benefit from previous pro- 
gramming experience with XQuery. 

2. Obviously, the XQuery language is well equipped to 
process XML data and to construct new XML frag- 
ments / messages. 

3. The implementation of Demaq becomes easier, as we 
can reuse existing XQuery implementations. 

3.3 Rule Creation 

The top-level construct in QML is a rule definition. Each 
rule has a name (RName) and associates an updating XQuery 
Update expression with a physical message queue or a slicing 
(QName). 

create rule RName for QName CondExpr

The body of the rule is always a conditional expression 
(hence the CondExpr nonterminal) to facilitate the detec- 
tion and optimization of conditions by the rule compiler. 
The CondExpr must always be an updating expression, mak- 
ing use of the novel queue do primitives explained below. 
For programming convenience, we allow the "else" part of 
the CondExpr to be absent and assume that such a rule 
produces an empty update list in the else case. 

3.4 Message Queue Primitives 

The result of a rule is a list of pending actions, as ex- 
plained above. The most important action resulting from a 
rule is the creation of a new message and its insertion into 
a particular queue. To perform this operation, we extend 
the list of XQuery update's expressions with a new enqueue 
update primitive 

do enqueue ExprSingle into QName
    (with PropName value ExprSingle)*

which causes a message to be enqueued to the specified
queue. The optional with clause allows properties of the
new message to be set explicitly (see Sec. 2.2).



create rule newOfferRequest for crm
  if (//offerRequest) then
    let $customerInfo :=
      <requestCustomerInfo>
        {//requestID} {//customerID}
      </requestCustomerInfo>
    let $exportRestrictionsInfo := ...
    let $plantCapacityInfo := ...
    return do enqueue $customerInfo into finance,
           do enqueue $exportRestrictionsInfo into legal,
           do enqueue $plantCapacityInfo into supplier
              with Sender value "http://ws.chem.invalid/"

Figure 5: Message handling and content access 

Example 3.1: The QML rule in Fig. [5] demonstrates
how basic message handling and parallelism are performed
as a response to the reception of a new customer request
at the customer relationship management (crm) queue of
our running example (Fig. [5]). It initiates three subsequent
checks by sending messages to other queues.

The default evaluation context of all XQuery and XPath
statements in a QML rule is the document root of the
triggering message, thus "//offerRequest" matches all
offerRequest elements in the incoming message. In this example, we
create the content of three new XML messages using let,
including the initial request ID and customer ID for message
correlation. As depicted by Fig. [4], these messages are
sent to the "finance", "legal" and "supplier" queues using
the enqueue update action, forking the control flow. To allow
message correlation by the supplier's service, we set the
"Sender" property of the corresponding message using
the with-value clause. This metadata is automatically
interpreted by Demaq's communication subsystem.

QML features a small library of functions (designated
by the namespace prefix qs:) to access messages and queues
from QML rules. The document node of the currently
processed message is returned by qs:message(). Access to the
document nodes of all messages in a particular queue is
provided by qs:queue(name). Message properties can be
obtained using the qs:property(pname) function, which returns
the value associated with the "pname" key.

Example 3.2 The rule in Fig. [6] demonstrates how to
formulate predicates which involve both the current message
and other messages in a particular queue. To determine the
customer's credit rating, the "invoices" queue is inspected to
find potentially unpaid bills. The messages in the invoices
queue are accessed using the qs:queue function. To access
the content of the triggering message, qs:message() can be
used (e.g. for correlating the contained customerIDs in the
predicate check).
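
As an informal illustration of this predicate, the following Python sketch mimics the credit-rating check outside of QML. The function names and the plain list of invoice documents are assumptions made for the example; only the element names are taken from the rule in Fig. 6. Like the XQuery general comparison, the check is existential: it succeeds if any customerID in an invoice equals any customerID in the triggering message.

```python
import xml.etree.ElementTree as ET


def customer_ids(doc: ET.Element) -> set:
    """All customerID values anywhere in a document (like //customerID)."""
    return {e.text for e in doc.iter("customerID")}


def check_credit_rating(message_xml: str, invoices: list) -> str:
    """Return the reply body, or '' if the rule condition does not hold."""
    msg = ET.fromstring(message_xml)
    # Rule condition: if (//requestCustomerInfo)
    if next(msg.iter("requestCustomerInfo"), None) is None:
        return ""

    # Predicate over another queue, correlated with the current message.
    wanted = customer_ids(msg)
    unpaid = any(customer_ids(ET.fromstring(inv)) & wanted for inv in invoices)

    # Build the reply: copy the correlation IDs, then add the verdict.
    result = ET.Element("customerInfoResult")
    for tag in ("requestID", "customerID"):
        for element in msg.iter(tag):
            result.append(element)
    ET.SubElement(result, "refuse" if unpaid else "accept")
    return ET.tostring(result, encoding="unicode")
```

In Demaq the returned document would be the argument of a do enqueue action into the crm queue; here it is simply returned to the caller.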

3.5 Slice Primitives 

3.5.1 Slice Rules 

Rules can be attached to slicings using the same syntax 
as for queues. Note that a slicing specifies many "virtual 
queues" (slices), and the rule is attached to every slice of 
the specified slicing. 

3.5.2 Slice Access 

Like queues, slices are made accessible through additional
XQuery functions. The qs:slice() function returns all



create rule checkCreditRating for finance
  if (//requestCustomerInfo) then
    let $result :=
      <customerInfoResult> {//requestID} {//customerID}
        {let $invoices := qs:queue("invoices")
         return
           if ($invoices[//customerID = qs:message()//customerID])
           then
             <refuse/> (: unpaid bills! :)
           else
             <accept/>}
      </customerInfoResult>
    return do enqueue $result into crm



Figure 6: Queue access 



create property requestID as xs:string fixed
  queue crm, customer value //requestID

create slicing requestMsgs on requestID

create rule joinOrder for requestMsgs
  if (qs:slice()[/customerInfoResult] and
      qs:slice()[/restrictionsResult] and
      qs:slice()[/capacityResult]) then
    if (qs:slice()[/customerInfoResult/accept] and
        not(qs:slice()[/restrictionsResult//restrictedItem])
        and qs:slice()[/capacityResult//accept]) then
      let $request := qs:queue("crm")/offerRequest
      let $items := $request[//requestID = qs:slicekey()]/items
      let $pricelist := collection("crm")[/pricelist]
      let $offer := ...
      return do enqueue $offer into customer
    else (: problems :)
      do enqueue <refusal>{//requestID}</refusal>
        into customer



Figure 7: Control flow synchronization 

messages from the slice of the current message. The key
(property value) of the current slice can be retrieved using
the qs:slicekey() function. Both of these functions
are only available to rules defined on slicings, so that they
are not ambiguous for messages belonging to more than one
slice.

Example 3.3 In our business process (depicted in Fig. [3]),
the decision whether to send or refuse an offer depends on
the results of three preceding checks (credit rating, export
restrictions and plant capacity), which may run in parallel.
In this example, we use a slicing to join the control flow of
the parallel checks before processing continues (Fig. [7]). The
slicing includes all messages in the crm and customer queues
which refer to the given requestID.

The rule in Fig. [7] checks for the arrival of a message in the
slicing and, if all three preceding checks have been performed,
completes the order request by sending a reply to
the customer. The qs:slice() function is used to acquire
the confirmation messages. The qs:slicekey() function
retrieves the slice key of the current slice for message
correlation with the customer's offer request. Master data (such
as price lists) stored in the non-messaging parts of Demaq is
acquired using the standard XQuery collection() function.
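
The join behaviour of this rule can be sketched in Python as a group-by on the slice key followed by a completeness test. The sketch is only an analogy: messages are modelled as dictionaries with hypothetical "kind", "accept" and "restricted" fields rather than XML documents, and the class name Slicing is an assumption.

```python
from collections import defaultdict


class Slicing:
    """Groups messages from several queues by a property value (slice key)."""

    def __init__(self, key_of):
        self.key_of = key_of             # extracts the slice key of a message
        self.slices = defaultdict(list)  # slice key -> messages in that slice

    def add(self, message: dict) -> str:
        key = self.key_of(message)
        self.slices[key].append(message)
        return key


def join_order(slice_msgs: list):
    """Mirror of joinOrder: None until all three check results are present."""
    by_kind = {m["kind"]: m for m in slice_msgs}
    needed = ("customerInfoResult", "restrictionsResult", "capacityResult")
    if not all(kind in by_kind for kind in needed):
        return None  # control flow not yet joined, rule produces no action
    accepted = (
        by_kind["customerInfoResult"]["accept"]
        and not by_kind["restrictionsResult"]["restricted"]
        and by_kind["capacityResult"]["accept"]
    )
    return "offer" if accepted else "refusal"
```

The rule is evaluated each time a message arrives in the slice, so the reply is produced by whichever of the three results happens to arrive last.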

3.5.3 Slice Resets and Message Retention

To indicate that the messages in a slice are no longer
needed, Demaq QML provides a reset update primitive
that resets a slice specified by a slicing name and a slice key.
(If no parameters are given, it resets the slice of the current
message with respect to the slicing on which the current rule
is defined.)

create rule cleanupRequest for requestMsgs
  if (qs:slice()/offer or qs:slice()/refusal) then
    do reset

Figure 8: Resetting a slice 

In the example above, messages of the requestMsgs slice 
are visible until an offer or a refusal message with the slice- 
specific key is finally sent, which completes processing for 
this requestID. This causes the "cleanupRequest" rule to
reset the current slice (Fig. [8]). Hence, as far as these rules 
are concerned, the messages may be deleted after an order 
request has been processed completely. However, there may 
be other slices which cause the messages to be retained for 
longer. Further, note that the slice reset is specified in a 
separate rule, which causes a reset of the slice, no matter 
which rule caused an offer or a refusal to be created. 
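
The interplay between slice resets and retention can be sketched as simple bookkeeping in Python. This is a simplification under stated assumptions: the class RetentionTracker is hypothetical, a reset is modelled by dropping the slice's membership set, and a message that belongs to no slice is treated as immediately collectable.

```python
class RetentionTracker:
    """Tracks which messages are still held by at least one slice."""

    def __init__(self):
        self.members = {}     # (slicing name, slice key) -> set of message ids
        self.all_ids = set()  # every message id currently stored

    def add(self, msg_id, memberships):
        """Register a message and the slices it belongs to."""
        self.all_ids.add(msg_id)
        for slice_id in memberships:
            self.members.setdefault(slice_id, set()).add(msg_id)

    def reset(self, slicing, key):
        """Effect of 'do reset': the slice no longer retains its messages."""
        self.members.pop((slicing, key), None)

    def collectable(self):
        """Messages that no remaining slice needs any more."""
        retained = set().union(*self.members.values())
        return self.all_ids - retained
```

A message that is a member of two slicings therefore stays in the store until both of its slices have been reset, which matches the remark that other slices may cause messages to be retained for longer.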

When receiving an order, a number of subsequent steps
are performed, including sending order messages to external
suppliers, arranging shipment and sending an invoice to the
customer, until the business process is successfully terminated
by retrieving the payment (Fig. [3]). We skip these
parts of the process, as they do not provide any additional
insight into QML. However, if the payment has not been
received within a particular grace period, a reminder has to
be sent to the customer.

Example 3.4 The code snippet in Fig. [9] shows another
example of how slicings can be applied to make sure messages
are retained. Here, a QML rule monitors the reception of
the timeout notification message previously registered at an
echo queue (when sending the invoice to the customer). If no
payment has been received upon expiration of the timeout, it
sends a reminder message to the customer. A slicing is used
to make sure both invoice and payment confirmation are
retained until the timeout message is received. If the timeout
message arrives and the payment has been confirmed, the
slice is reset by the "resetPayedInvoices" rule.

3.6 Error Handling 

An important aspect of any computer program is its re- 
action to faults. As the Demaq engine is supposed to run 



create property messageRequestID as xs:string fixed
  queue invoices, finance value //requestID

create slicing invoiceRetention on messageRequestID

create rule resetPayedInvoices for invoiceRetention
  if (qs:slice()//timeoutNotification
      and qs:slice()/paymentConfirmation) then
    do reset

create rule checkPayment for finance
  if (//timeoutNotification) then
    let $mRID := qs:message()//requestID
    let $payments := qs:queue()[/paymentConfirmation]
    return
      if (not($payments[//requestID = $mRID])) then
        let $invoice := qs:queue("invoices")[//requestID = $mRID]
        let $reminder := (: access initial invoice :) ...
        return do enqueue $reminder into customer
      else ()

Figure 9: Message retention 



permanently to guarantee high service availability, it must 
be equipped with powerful error handling facilities. In this 
section, we review some sources of errors, and illustrate how 
they can be handled in Demaq. 

Application program related: Some errors are caused by
the application program itself. While syntactical er- 
rors or wrong static typing can be detected at compile 
time and can be handled while developing the applica- 
tion, there are also many application-related runtime 
errors. For example, as we use XQuery in our rule 
specifications, processing might raise one of the var- 
ious runtime errors defined by the XQuery standard, 
which are mainly related to dynamic typing issues. 

Message related: Message-related errors may occur when
trying to enqueue invalid XML documents received 
from remote peers into a gateway queue. For example, 
input documents may be truncated or not well-formed, 
thus resulting in parsing errors during processing, or 
rules create messages whose schema is incompatible 
with the target queue's schema. 

Network related: Demaq provides application developers
with a queue-based interface for sending messages over 
the network by simply enqueuing them into a cor- 
responding gateway queue. Unfortunately, the exis- 
tence of the network cannot be made completely trans- 
parent to application programs. Interactions with re- 
mote transport endpoints may encounter a variety of 
network-related issues [33]. Among these are low-level
network problems such as temporary or permanent
unavailability of remote transport endpoints, name
resolution failures, timeouts or routing errors. Further
up the protocol stack, there may be other sources of 
problems. For example, using the WS-Security SOAP 
enhancements [4] may result in encountering invalid 
certificates, wrong signatures or decryption failures. 

Some of those errors can be resolved by the communication
subsystem (for example, message delivery can be retried
automatically). However, in many cases, application
programs must handle such errors explicitly.

System: Another source of errors is the Demaq processing
system itself. System failures may occur due to in- 
sufficient system resources (such as main memory or 
secondary storage), infrastructure problems (e.g. in 
the underlying database or operating system) or even 
hardware defects. 

To deal with these various kinds of errors, corresponding 
error handling facilities have to be provided for application 
developers. BPEL [3], XL [17] and many other programming
languages (including e.g. Java and C++) allow the
specification of scoped exception handlers for those parts of
the program that might potentially fail. Extending our
expression language XQuery with exception handling would
jeopardize most of the benefits discussed in Sec. 3.2, such as
reusing existing XQuery implementations, and would risk
incompatibility with future versions of XQuery, which might
incorporate similar mechanisms.

Instead, like all other events in the Demaq system, errors 
are represented by XML messages sent to error queues. If 
an error is encountered during processing, a corresponding
failure message is inserted into an error queue. The error
message not only contains an error specification according 
to a predefined schema, but may also contain (a reference 
to) the data which caused the error, such as message IDs 
or corrupt incoming message bodies. Different error queues 
can be specified at the rule, queue, module and system level. 
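
A minimal Python sketch of this mechanism follows. Two points are assumptions rather than statements of the source: the lookup order (most specific level first: rule, queue, module, system) and the exact shape of the error document, whose element names are borrowed from the paths used in Fig. 10 because the predefined schema is not reproduced here. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_error_queue(rule_eq=None, queue_eq=None, module_eq=None,
                        system_eq=None) -> str:
    """Pick the error queue, assuming the most specific level wins."""
    for candidate in (rule_eq, queue_eq, module_eq, system_eq):
        if candidate is not None:
            return candidate
    raise LookupError("no error queue configured at any level")


def make_error_message(kind: str, initial_xml: str) -> str:
    """Wrap an error kind and the offending message into one XML document."""
    error = ET.Element("error")
    ET.SubElement(error, kind)  # e.g. <disconnectedTransport/>
    wrapper = ET.SubElement(error, "initialMessage")
    try:
        wrapper.append(ET.fromstring(initial_xml))
    except ET.ParseError:
        # Corrupt bodies cannot be embedded as XML; keep them as escaped text.
        wrapper.text = initial_xml
    return ET.tostring(error, encoding="unicode")
```

Since the result is an ordinary message, an error handling rule can inspect it with the same path expressions it would use for any other message.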

This way, error handling is integrated seamlessly with our 
message and queue-based programming paradigm, reflect- 
ing the fact that in many cases, the border between regular 
application logic and error handling is quite fuzzy. Further, 
we can perform advanced error handling by reusing our reg- 
ular primitives. For example, error queues may be gateway 
queues, notifying remote operators. Error queues may be 
persistent, guaranteeing eventual reaction to an error even 
in the case of masking higher level (e.g. system) failures. 

Example 3.5 The example in Fig. [10] shows how error
handling can be done by a Demaq application. When
receiving a customerOrder request, the application rule tries
to send back a confirmation message to the customer. All
errors that might be caused by the confirmOrder rule are
handled by the crmErrors queue. In this example, an error
handling rule (deadLink) tracks all communication failures
caused by disconnected transport endpoints. It compensates
for the network failures by sending a mail confirmation using
a corresponding Web Service (connected by the postalService
gateway queue).

create queue crmErrors kind basic mode persistent
create property orderID as xs:integer
  queue crm value //customerOrder/orderID
create slicing retainOrders on orderID

create rule confirmOrder for crm errorqueue crmErrors
  if (//customerOrder) then (: send confirmation :)
    let $confirmation := <confirmation>
        {//orderID} (: additional details :)
      </confirmation>
    return do enqueue $confirmation into customer

create rule deadLink for crmErrors
  if (/error/disconnectedTransport) then
    (: send confirmation via snail mail :)
    let $orders := qs:queue("crm")//customerOrder
    let $initialOrderID := /error/initialMessage//orderID
    let $address := $orders[orderID = $initialOrderID]/address
    let $requestMail := <sendMessage>{$address}
        {//initialMessage}</sendMessage>
    return do enqueue $requestMail into postalService



Figure 10: Error handling 



4. IMPLEMENTATION ASPECTS 

Within this section, we outline some aspects of the ongo- 
ing implementation of Demaq and discuss how certain as- 
pects of the semantics of our declarative language can be 
used to optimize execution performance.

4.1 Transactional Queues 

Message-oriented middleware solutions successfully rely 
on transactional queues. Due to their functional overlap 
with typical database features such as recovery and secu- 
rity, it has been argued that queues should be integrated 

^Additional information about the Demaq project is avail- 
able at http://www.demaq.net/ 



into database systems [20]. Many commercial database vendors
have considered incorporating queues into their products [15]
or already support message queues as native data
structures [19, 34].

We believe that a queue-enabled XML data store is the 
most suitable foundation for the storage subsystem of an ac- 
tive XML message processing system such as Demaq. The 
current implementation of the Demaq message store is built 
on the foundation of Natix [16], a native XML data store
that is available as a C++ library. While Natix allows for 
the efficient, reliable and persistent storage of XML frag- 
ments, we had to extend the existing collection-based stor- 
age and recovery subsystems with recoverable queues for 
XML message storage. Since the XML queues and the
XML collections have similar storage formats, we can lever- 
age the Natix run-time system for rule execution. This in- 
cludes the Natix virtual machine for query evaluation, the 
recovery and locking subsystems as well as the schema man- 
agement components. 

Having a full-fledged XDS in source form, we can also
apply existing database and transaction processing research
to message processing. For example, our append-only
approach for message queues simplifies logging and recovery
because there are fewer in-place updates. Further, our
declarative mechanism for specifying message retention (Sec. 2.3)
frees the system from the need to fully log message deletions:
after a crash, the decision to delete certain messages can
be reached without analyzing the log.

However, Natix is not the only option as the underlying
message store. The Demaq design is highly modular and could
also be implemented on top of other existing, queue-enabled
database systems, such as the Microsoft SQL Server Service
Broker (MSSSB) [31]. MSSSB includes some of the necessary
features, such as data retention and security, and even
incorporates queue-based system facilities closely related to
our gateway and echo queues. While MSSSB already allows
the creation of messaging-based applications, it uses the T-SQL
programming language, which is imperative at the top level
and requires much tedious glue code to work with XML
data. The "activation" service that can be used to
automatically run programs on message arrival does not include
declarative conditions and hence complicates automatic
optimizations. However, it could be used to trigger Demaq's
rule processing and scheduling.

4.2 Gateway Queues 

To interface with remote Web Services, Demaq includes
a communication subsystem, offering both synchronous and
asynchronous communication channels to application rules
in the form of gateway queues. For synchronous calls, system
properties are used to correlate request and reply messages.
Demaq provides SOAP bindings to transport protocols such
as HTTP and SMTP and the underlying TCP/IP
client-server functionality.

4.3 Slicing 

While slicing can be implemented by merging the slice
definition into the rules (see below), this would require
evaluating a complex query for every incoming message. Instead,
similar to the materialized views concept in RDBMSs, it is

^The queue extensions (but not yet the Demaq 
Rule Engine) are included in Natix V2 available at 
http://db.informatik.uni-mannheim.de/natix.html.en 



possible to maintain a physical representation of the slices, 
for example using a B-Tree indexed by the slice key. 
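
To make the idea of a materialized slice index concrete, the following Python sketch keeps slice keys in sorted order and locates a slice by binary search. A sorted list with the standard bisect module stands in for the B-Tree mentioned above; it shares the ordered-lookup behaviour but not the paging or update characteristics of a real B-Tree. The class name SliceIndex is hypothetical.

```python
import bisect


class SliceIndex:
    """Materialized slices: ordered map from slice key to message ids."""

    def __init__(self):
        self._keys = []    # sorted slice keys
        self._slices = []  # parallel list: message ids of each slice

    def _find(self, key):
        """Return (position, found) for a key using binary search."""
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def insert(self, key, msg_id):
        """Maintain the index incrementally when a message arrives."""
        i, found = self._find(key)
        if found:
            self._slices[i].append(msg_id)
        else:
            self._keys.insert(i, key)
            self._slices.insert(i, [msg_id])

    def lookup(self, key):
        """Messages of one slice, without scanning any queue."""
        i, found = self._find(key)
        return list(self._slices[i]) if found else []
```

With such an index, evaluating qs:slice() for an incoming message becomes a single key lookup instead of a query over all queues covered by the slicing.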

Slices also present an opportunity for improving concurrency,
as they form a natural new granularity, coarser than
messages, but orthogonal to queues. By locking just the 
affected slices, full serializability of the individual message- 
processing transactions can be guaranteed without locking 
whole queues. 

4.4 Rule Processing 

The rule-processing module of Demaq is responsible for 
executing the application logic defined by QML rules. Its 
main building blocks are a rule compiler and a scheduling 
component enforcing our execution model. 

4.4.1 Rule Compiler 

On deployment of an application, the rule compiler is used 
to compile the application's rule set into execution plans. 
Essentially, rule evaluation is a query against the state of 
the underlying storage engine that results in an update list. 
Hence, we can reuse execution plans and optimization tech- 
niques for XML query processing to perform rule evaluation. 

For each queue, the compiler collects all rules that are
associated with it. The bodies of the rules represent XQuery
Update expressions, which are rewritten. Rewriting includes
supplying default parameters to functions which depend on
the current queue (such as qs:queue()). Similar to
conventional view merging, fixed properties (see Sec. 2.2) are
inlined, as is slice access that is not materialized (see above).
After rewriting, the rule bodies are combined into a single
query by concatenating all pending actions into a single
sequence. The query is then compiled into an execution plan
that is executed every time a message arrives in that queue.
A variety of existing techniques can be leveraged to improve
processing performance, including XML filtering [14],
efficient expression evaluation [35], and template folding [23].

The technique outlined above creates a "canonical" execu- 
tion plan as a starting point. We intend to exploit the fully 
declarative nature of our language by performing cost-based 
optimization by rewriting rule sets into equivalent, but more 
efficient ones. 

4.4.2 Scheduler 

Demaq's scheduling component implements the execution 
model introduced in Sec. 3.1. The scheduler maintains a list
of all unprocessed messages and chooses the next message 
to be handled, considering both their temporal ordering and 
the priority of the containing queues. Thus, a message in 
a high priority queue may be processed before another one 
stored in a queue with a lower priority, even if it has been 
created more recently. 
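
One possible scheduling policy consistent with this description is sketched below in Python using a binary heap. The exact policy is an assumption: here queue priority strictly dominates and arrival order only breaks ties, whereas the text merely states that both criteria are considered. The class and method names are hypothetical.

```python
import heapq
import itertools


class Scheduler:
    """Chooses the next unprocessed message by queue priority, then age."""

    def __init__(self, priorities):
        self.priorities = priorities   # queue name -> priority (higher first)
        self._heap = []
        self._seq = itertools.count()  # global arrival counter

    def notify(self, queue, message):
        """Called when a new message has been created in a queue."""
        prio = self.priorities.get(queue, 0)
        # Negate the priority because heapq pops the smallest entry first.
        heapq.heappush(self._heap, (-prio, next(self._seq), queue, message))

    def next_message(self):
        """Return (queue, message) to process next, or None if idle."""
        if not self._heap:
            return None
        _, _, queue, message = heapq.heappop(self._heap)
        return queue, message
```

A strict-priority policy like this one can starve low-priority queues under sustained load, so a production scheduler would likely blend the two criteria rather than order by priority alone.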

The scheduler also orchestrates background tasks, such as 
gateway queue processing and the message garbage collec- 
tion for messages that do not have to be retained. 

5. CONCLUSION 

This paper discusses the Demaq approach for declarative 
XML message processing. Demaq may be used to specify 
the behavior of nodes in a variety of distributed applica- 
tion scenarios, including, but not limited to Web Service 
implementation and orchestration. The major feature of De- 
maq is a fully declarative language that is based on XQuery 
and has a simple semantics which facilitates optimization. 



While implementing the Demaq execution model on top of 
a queue-enabled DBMS such as our native XML data store 
Natix [16], we found that we can benefit from the application
of many existing techniques for transaction processing
and declarative query processing to message processing. 

Much remains to be done. For example, time is an im- 
portant aspect for Active Web applications. While our or- 
dered queue model implies some relationship of messages 
with respect to time, and we have mentioned time-based 
echo queues in Sec. 2.1.3, we have not explained in detail how
time-based conditions are incorporated into our language, 
and how they are evaluated. Further, Demaq applications 
currently rely on a static set of queues, slicings, and rule 
definitions that cannot be adapted during system runtime. 
As a result, each time an application evolves, the processing 
system has to be shut down and restarted. Clearly, this is
unacceptable for zero-downtime environments, raising
the question of how to allow for dynamic queue and rule
evolution while still guaranteeing correct and reasonable system
behavior.